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Abstract 

Studying how genetic predispositions come together with environmental factors to contribute to complex behavioral 
outcomes has great potential for advancing the understanding of the development of psychopathology. It represents 
a clear theoretical advance over studying these factors in isolation. However, research at the intersection of multiple 
fields creates many challenges. We review several reasons why the rapidly expanding candidate gene-environment 
interaction (cGxE) literature should be considered with a degree of caution. We discuss lessons learned about 
candidate gene main effects from the evolving genetics literature and how these inform the study of cGxE. We 
review the importance of the measurement of the gene and environment of interest in cGxE studies. We discuss 
statistical concerns with modeling cGxE that are frequently overlooked. Furthermore, we review other challenges that 
have likely contributed to the cGxE literature being difficult to interpret, including low power and publication bias. 
Many of these issues are similar to other concerns about research integrity (e.g., high false-positive rates) that have 
received increasing attention in the social sciences. We provide recommendations for rigorous research practices for 
cGxE studies that we believe will advance its potential to contribute more robustly to the understanding of complex 
behavioral phenotypes. 
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in complex behavioral outcomes. Following the publica- 
tion of cGxE studies in high-profile scientific journals 
(Caspi et al., 2002, 2003), in the last decade researchers 
have witnessed an explosion of interest in this area. 
There has been an exponential increase in the number of 
cGxE studies published, with researchers from diverse 
backgrounds routinely incorporating cGxE components 
into their studies. However, there has been growing 
skepticism about the replicability of many of these 
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There have been radical shifts as to belief about whether 
human behavior is more strongly determined by genes or 
by environment over the course of scientific history. 
Researchers in different fields seemingly advocated for 
the importance of one over the other (the so-called 
nature vs. nurture debate), with some camps studying 
genetic influence and others studying environmental fac- 
tors. It is now widely accepted that both genetic and 
environmental influences are important, and characteriz- 
ing how these influences come together to impact out- 
come, that is, the study of gene-environment interaction 
(GxE), has become an important area of study across 
multiple disciplines. That said, few research topics have 
generated more controversy and less clarity than the 
study of candidate gene-environment interaction (cGxE) 
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findings (e.g., Risch et al., 2009) and increasing concern 
about the quality of this rapidly expanding literature. 

This concern led the National Institute on Alcohol 
Abuse and Alcoholism to sponsor a workshop in January 
2013 that brought together a small group of researchers 
to discuss these challenges and to provide recommenda- 
tions for how to move the field forward. Those discus- 
sions formed the foundation for this article, in which we 
review a number of reasons why the existing cGxE litera- 
ture should be considered with a degree of caution. This 
is not to imply that true discoveries are absent in the lit- 
erature. However, there are reasons to be concerned 
about the methods used and the conclusions drawn from 
many cGxE studies. Drawing from accumulating findings 
in psychiatric genomics, 1 we consider potential pitfalls 
and logical inconsistencies with some of the extant cGxE 
literature. We discuss ways of refining the development 
of cGxE hypotheses, conducting statistically rigorous 
analyses, and interpreting findings within the broader 
context of genetics research — all directions that we 
believe hold promise for advancing the potential of cGxE 
studies to contribute more robustly to the understanding 
of complex behavioral phenotypes. 

History 

The idea that genetic or biological predispositions are likely 
to interact with environmental factors to contribute to psy- 
chiatric and substance use disorders has been entertained 
for quite some time (Whytt, 1765). Long before it was fea- 
sible and cost-efficient to measure specific genes, research- 
ers of twin studies documented that the importance of 
overall genetic influences (i.e., heritability) could vary 
considerably as a function of measured environmental fac- 
tors (Kendler & Eaves, 1986). For instance, Kendler and 
colleagues found that people at highest genetic risk for 
depression (i.e., individuals with an identical twin with a 
history of depression) were significantly more likely than 
individuals not carrying a genetic predisposition to have an 
onset of the disorder in the presence of exposure to a 
severe stressful life event, suggesting that genetic factors 
influence the risk for major depression in part by altering 
individual sensitivity to the depression-inducing effects of 
stressful life events (e.g., Kendler et al., 1995). With meth- 
odological advances that allowed twin researchers to model 
how genetic influences change as a function of the envi- 
ronment (T. M. Button et al., 2009), studying GxE became 
a popular area of research in behavior genetics (T. M. 
Button, Lau, Maughan, & Eley, 2008; Dick, Bernard, et al., 
2009; Dick, Pagan, Holliday, et al, 2007; Dick, Pagan, Viken, 
et al., 2007; Dick, Rose, Viken, Kaprio, & Koskenvuo, 2001; 
Dick, Viken, et al., 2007; Harden, Hill, Turkheimer, & 
Emery, 2008; Purcell, 2002; R. J. Rose, Dick, Viken, & 
Kaprio, 2001; South & Krueger, 2008). 



The accumulating body of research has demonstrated 
that the importance of genetic influences can vary dra- 
matically as a function of environmental context; alterna- 
tively phrased, the importance of environmental 
influences can vary dramatically as a function of genetic 
factors. For example, it has been demonstrated that 
genetic influences on adolescent substance use and 
externalizing behavior are far stronger under conditions 
of low parental monitoring (Dick, Pagan, Viken, et al., 
2007; Dick, Viken, et al., 2007); high peer deviance (T. M. 
Button et al., 2009; Dick, Pagan, Holliday et al., 2007; 
Dick, Pagan, Viken, et al., 2007; Harden et al., 2008); and 
state, school, and neighborhood conditions that provide 
reduced social monitoring and enhanced opportunity to 
use (Boardman, 2009; Dick, Bernard, et al., 2009; Dick 
et al., 2001; R. J. Rose et al., 2001). However, this body of 
GxE research did not gain widespread recognition out- 
side the field of twin research. It was not until the 
influential Science publication by Caspi et al. (2003), 
attributing part of the genetic sensitivity to the depres- 
sogenic effects of stressful life events to variations in a 
specific DNA sequence (a polymorphism in the sero- 
tonin-transporter-linked polymorphic region [5-HTTLPR]), 
that GxE research became a widely recognized area of 
study outside the field of behavior genetics. However, an 
important distinction arose between the GxE work con- 
ducted in the field of behavior genetics and the wide- 
spread adoption of GxE by other fields, particularly the 
social sciences. Historically, the research conducted by 
behavior geneticists focused on "latent" genetic influ- 
ences. This means that the importance of genetic factors 
is estimated statistically by phenotypic similarity across 
individuals with different degrees of genetic and environ- 
mental sharing, with methodologies such as family, twin, 
and adoption studies (Bergeman & Plomin, 1989). Using 
this method, researchers estimate the overall importance 
of genetic effects on a phenotype, that is, the total contri- 
bution of all genes influencing the phenotype. GxE in 
this context means that the overall importance of genetic 
variance differs across environments. In contrast, most 
GxE researchers in fields outside behavior genetics have 
studied measured candidate genes. In these studies, 
researchers test whether the association of a specific 
genetic variant with a given outcome varies across differ- 
ent environments. We refer to these studies as cGxE, and 
they are the focus of this review. 

The publication of several high-profile cGxE studies 
(e.g., MAOA x Maltreatment in Antisociality; Caspi et al., 
2002), as well as the technological advances in genetics 
that made genotyping accessible and cost-efficient, likely 
contributed to the dramatic increase in cGxE research. 
Regardless of the validity and reproducibility of those ini- 
tial efforts (which continues to be debated; Brown & 
Harris, 2008; Clarke, Flint, Attwood, & Munafo, 2010; 
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Culverhouse et al., 2013; Karg, Burmeister, Shedden, & 
Sen, 2011; Munafo, Durrant, Lewis, & Flint, 2009; Risch 
et al., 2009), they left their mark on the field by creating 
widespread recognition of the potential importance of 
the interplay between genetic and environmental factors 
in developmental pathways underlying the etiology of 
behavioral outcomes in ways that the latent GxE work of 
behavior geneticists had failed to do. In the excitement 
surrounding the initial cGxE findings, and spurred by 
funding initiatives that encouraged research in this area, 
investigators from disparate backgrounds incorporated 
measured genotypes into their studies. In the wake of 
historical tension surrounding the relative importance of 
genetic versus environmental effects (the so-called nature 
vs. nurture debate), GxE provided a conciliatory frame- 
work that could facilitate a synthesis of scientific fields 
that had historically been at odds with one another. 

However, early enthusiasm for cGxE findings has 
waned as the number of failures to replicate original find- 
ings mounted. In many ways, the progression of cGxE 
studies has closely paralleled the trajectory of studies of 
the main effects of candidate genes, with early enthusi- 
asm and adoption of genotyping candidate genes giving 
way to a literature plagued by small studies, failures to 
replicate, and a proliferation of novel findings with effect 
sizes that appeared at odds with what was subsequently 
found with well-powered studies (Neiswanger, Kaplan, & 
Hill, 1995). However, the study of genetic main effects 
has advanced dramatically since the early days of candi- 
date gene research. We believe that what has been 
learned about the genetics of complex behavior from 
studies of genetic main effects yields insights into previ- 
ous cGxE studies and ways to improve such research in 
the future. We begin by providing a broad review of 
developments in the field of psychiatric genetics over the 
past decade and discuss how this knowledge can inform 
studies of cGxE. 

Early statistical genetics: Linkage and 
candidate gene studies 

The field of statistical genetics, focused on finding genes 
that contribute to behavioral outcomes and disorders, has 
undergone rapid advances over the past decade. As 
researchers' knowledge about genetics has progressed, 
so too have the methodologies favored for gene identifi- 
cation (see Fig. 1 for an overview of common gene- 
finding strategies). Linkage and candidate gene studies 
were early gene identification strategies, believed to have 
complementary strengths. 

Using linkage studies, researchers agnostically scanned 
the mapped genome by looking for chromosomal 
regions that were shared among affected family mem- 
bers (suggesting there was a gene in that region that 



contributed to the disorder). The advantage of linkage 
was that it did not require any a priori knowledge of the 
underlying biology of the outcome, in theory making it 
possible to discover new genes involved in the outcome 
that could expand the understanding of the biology of 
the disorder. Linkage studies were used to successfully 
identify many genes that contributed to Mendelian dis- 
orders, in which a single gene following a straightfor- 
ward inheritance pattern with a major impact on outcome 
was present (Gusella et al., 1983; Murray et al., 1982; 
Tsui et al, 1985). However, linkage methods were less 
successful when applied to complex behavioral out- 
comes in which many genes are likely to be involved, 
each having just a small effect on the behavior, along 
with the environment. 

In contrast to the hypothesis-free linkage approach, 
researchers using candidate gene studies focused on 
genes, and variants within those genes, that were hypoth- 
esized to have biological relevance to the outcome of 
interest. In this way, candidate gene studies had the 
advantage of being more precise than linkage studies in 
that they had the potential to pinpoint specific genes or 
genetic variants, rather than just specific chromosomal 
regions. However, they relied on the investigator to cor- 
rectly "guess" what genes were biologically relevant to 
the outcome. Despite thousands of candidate gene pub- 
lications on behavioral phenotypes, the approach remains 
controversial, and very few candidate gene findings are 
widely accepted within the genetics community. Over 
time, it has become clear that the genetic architecture of 
behavioral traits is highly complex, that effect sizes of 
genetic polymorphisms are likely to be small (as dis- 
cussed later), and that scientists' ability to predict a priori 
which genes are likely to be relevant to a behavioral out- 
come has been very poor (Bosker et al., 2011; Colhoun, 
McKeigue, & Davey Smith, 2003; Need et al., 2009; 
Sullivan et al., 2008). 

Later statistical genetics: Genome-wide 
association studies (GWAS) 

As the cost of genotyping dropped during the 2000s, it 
became possible to conduct association tests, as are done 
in candidate gene studies, but across the entire genome. 
Such GWAS are hypothesis-free as with linkage studies, 
but they have much higher power to detect small effects 
of common variants. In GWAS, hundreds of thousands to 
millions of genetic markers known as single nucleotide 
polymorphisms (SNPs) are genotyped across the genome 
in an attempt to identify common variants that are asso- 
ciated with a particular outcome (disorder, behavior, 
etc.), suggesting that a particular genetic variant (or one 
very nearby) contributes to the outcome. In a sense, 
GWAS combine the advantages of linkage studies 
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Fig. 1. Overview of common gene-finding strategies. LOD = logarithm of the odds. 



(agnostic screening across the genome) and candidate 
gene studies (more precise localization of the gene/ 
genetic variant). 

GWAS were made possible by the rapid discovery of 
many more genetic variants through the International 
HapMap Project (http://hapmap.ncbi.nlm.nih.gov/), which 
had the goal of developing a public resource to catalogue 
normally occurring genome-wide variation in SNPs by 
annotating the sample genomes of small groups of ethni- 
cally homogenous individuals (e.g., Caucasians of European 
descent are represented by Utah residents with ancestry 
from northern and western Europe collected in 1980) to 



create what is known as a reference panel. In addition 
to providing a database for identification and comparison 
of the polymorphic nature of SNPs, HapMap also allows 
for the examination of the extent to which neighboring 
SNPs are correlated with each other via a population 
genetics process called linkage disequilibrium (LD). 
When two or more SNPs are in high LD (e.g., correlations 
> 0.8), if one of the SNPs is genotyped, then the genotype 
at the others can be probabilistically inferred via imputa- 
tion. The ability to infer surrounding genotypes on the 
basis of LD patterns means that researchers can now cost- 
efficiently scan the genome with a much smaller subset of 
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markers than previously needed (e.g., in the range of 
370,000-600,000) and impute the remaining commonly 
occurring markers across the genome. Whereas early 
attempts to impute variation were restricted to common 
SNPs (>5% minor allele frequency), the landmark 1000 
Genomes Project (http://www.1000genomes.org/) used a 
similar strategy to identify less common SNPs (<1%). 
Researchers' growing knowledge of genetic variants 
across the genome, coupled with exponential decreases 
in costs of high-density GWAS arrays (>1 million SNPs at 
<$100 per participant currently), now allows investigators 
to interrogate more than 10 million polymorphisms in 
relation to outcomes. 

Two primary points have become apparent over the 
last several years from GWAS: (a) The effect sizes associ- 
ated with individual genetic variants are very small, 
usually with odds ratios on the order of 1.1, and (b) 
researchers' ability to select a priori which genes are via- 
ble candidates for psychiatric and substance use disor- 
ders has been poor (Kendler, 2013; Sullivan, Daly, & 
O'Donovan, 2012). There are rare exceptions, such as the 
role of alcohol dehydrogenase genes in alcohol depen- 
dence (Shen et al., 1997). Researchers now realize that 
early candidate gene studies, as well as early atheoretical 
systematic gene-finding efforts (such as linkage studies), 
were underpowered to detect genes with the small effect 
sizes that more recent studies suggest are likely to be 
realistic. Although GWAS were more successful for some 
conditions (Crohn's, diabetes, macular degeneration; 
Manolio & Collins, 2009), like linkage studies, early 
GWAS were largely unsuccessful in the area of psychiat- 
ric and substance use disorders (Kendler, 2013). Few 
SNPs were detected that met genome-wide levels of 
significance. As increasingly large sample sizes have 
been procured, researchers now know that most early 
GWAS were simply underpowered (Visscher, Brown, 
McCarthy, & Yang, 2012). Amassing sample sizes through 
large consortia on the order of tens of thousands or more 
has revealed that the number of significant findings 
increases as the sample sizes increase. 

The discoveries from these large GWAS are robust and 
replicable, and for certain phenotypes, the amount of vari- 
ance explained in total from genome-wide significant 
SNPs is becoming nontrivial (e.g., 60% for Type 1 diabetes, 
10% for height; Visscher et al., 2012). In the area of psychi- 
atric genetics, studies of the genetic basis of schizophrenia 
are currently enjoying the most success, in which sample 
sizes on the order of >13,800 cases and >18,000 controls 
have now been accumulated, leading to the detection of 
>3,500 loci in 12 genomic regions that contribute to the 
disorder (Ripke et al., 2013). It is noteworthy, however, 
that even with such impressive sample sizes, Ripke et al. 
(2013) acknowledged in their landmark study (which is 
undergoing a further growth in sample size) the lack of 
power to detect genotype relative risks < 1.1. 



Reasons to Be Concerned About the 
Published cGxE Literature 

The emerging genetics literature suggests that the small 
sample sizes (e.g., n < 1,000), typical of candidate gene 
studies to date, are likely to be grossly underpowered for 
detecting genetic influences with small effect sizes. (The 
reason that many, perhaps most, researchers using candi- 
date gene studies have reported positive associations 
despite such lack of power is discussed later.) Some of 
the confusion about expected effect sizes and the sample 
sizes needed for cGxE studies may surround differences 
in the conceptualization of GxE effects across different 
fields (see Fig. 2 for an illustration of this). There are two 
ways that cGxE studies can be conceptualized: (a) as a 
genotype moderating the association between an envi- 
ronmental factor and an outcome (i.e., increasing expo- 
sure to major stressful life events is more strongly 
associated with an increased risk for depression in the 
presence of the short allele of 5-HTTLPR — often the 
default conceptualization for psychologists) or (b) as an 
environment moderating the association between a gen- 
otype and an outcome (i.e., the association between the 
short allele of 5-HTTLPR and depression is strongest in 
individuals experiencing major stressful life events — 
often the default conceptualization for geneticists). 
Statistically, these are equivalent and indistinguishable, 
but they can lead to different interpretations of the same 
data and different expectations about the likelihood of 
detecting a GxE effect. Conceptualizing GxE as a genetic 
effect (on an environment-behavior association) may 
lead one to assume small effect sizes on the basis of the 
growing GWAS literature demonstrating that genetic 
effects on complex outcomes generally have very small 
effect sizes. Conceptualizing GxE as an environmental 
effect on a gene-behavior association may lead one to 
assume larger effects sizes. Nevertheless, under either 
interpretation, the effect size of the candidate gene is 
critical. Recognizing the modest main effect sizes pro- 
duced by individual candidate gene polymorphisms, it 
may be overly optimistic to presume that the effect sizes 
associated with cGxE will be systematically larger. This is 
clearly a matter of some debate, as if certain types of 
cross-over interactions were prevalent, it would be pos- 
sible. Nevertheless, the lessons we have learned from the 
history of gene finding underscore the need for research- 
ers to be cognizant of the strong possibility that they are 
dealing with small effect sizes (or to provide strong justi- 
fication for why larger effect sizes are expected) and to 
demonstrate that their samples are adequately powered. 

In addition to what GWAS have taught researchers 
about genetic effect sizes, GWAS have also been informa- 
tive as to the likelihood that a hypothesized candidate 
gene will be associated with the hypothesized outcome. 
Robust and replicable GWAS signals (e.g., CACNA1C for 
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Fig. 2. Alternative portrayals of a Candidate Genotype (in this example, a single nucleotide polymorphism) x Environment (in this example, 
high vs. low adversity) interaction. The left panel shows that the strongest impact of high adversity on psychopathology is found for individuals 
with the GG genotype. Alternatively, the right panel shows that the strongest association of the genotype with psychopathology is among those 
experiencing high adversity. These are alternate presentations of the same interaction. 



schizophrenia, Ripke et al., 2013, and bipolar disorder, 
Ruderfer et al., 2014) have tended to be distinct from 
those that were routinely hypothesized in candidate gene 
studies (e.g., COMT, MAOA; Craddock, Dave, & Greening, 
2001), casting considerable doubt regarding the burden 
of a priori evidence for these selections. With rare excep- 
tions (e.g., rsl6969968 in the CHRNA5-CHRNA3-CHRNB4 
cluster, associated at p < 10-70 across several GWAS, was 
initially posited as a candidate gene; Tobacco and 
Genetics Consortium, 2010), widely studied candidate 
genes were not found to be significant when studied sys- 
tematically across the backdrop of the genome in well- 
powered studies (Bosker et al., 2011; Collins et al., 2012; 
Lasky-Su et al., 2008). In fact, some of the findings to 
emerge from GWAS in other areas, such as Crohn's dis- 
ease, have suggested new pathways that were not previ- 
ously suspected to play a role in the disorder and that 
drastically changed the presumed understanding of the 
underlying biology of the disorder (Manolio & Collins, 
2009). Past experience would suggest that best guesses 
for candidate genes affecting environmental sensitivity 
(i.e., cGxE) are unlikely to fare better than they have for 
other phenotypes investigated to date in GWAS. The 
combination of low prior likelihood of a given candidate 
being correct, compounded by likely small effect sizes 
and low power for any given truly associated variant, 
suggests that the false discovery rate — the proportion of 
"discoveries" in candidate gene main effect and cGxE 
studies that are actually false — may be unacceptably high 
(Duncan & Keller, 2011). 

In addition to the potential for low power and low 
prior probabilities associated with the study of candi- 
date genes (Munafo, 2009), it is also likely that there is 
insufficient correction for and underreporting of multiple 



testing. Publication bias is also probable, whereby authors 
are more likely to submit, and editors are more likely to 
accept, cGxE findings that are statistically significant. A 
recent report notes that problems contributing to and a 
consequence of such bias are rampant in the cognitive 
sciences (Ioannidis, Munafo, Fusar-Poll, Nosek, & David, 
2014), and there is similar evidence in the cGxE literature 
(Duncan & Keller, 2011). Specifically, the vast majority of 
first reports of a given cGxE finding were positive, 
whereas a much lower proportion of attempted replica- 
tions were positive. Ioannidis et al. (2014) referred to this 
as the Proteus phenomenon, whereby the rapid publica- 
tion of positive findings might temporarily create a halo 
in which negative findings might be more readily enter- 
tained by journal editors for a short period of time. 
Furthermore, and inconsistent with expectations based 
on statistical power, Duncan and Keller (2011) found that 
the larger the cGxE sample, the less likely it was to be 
significant. This trend would not be expected if cGxE 
findings were valid. These observations are consistent 
with widespread publication bias in the cGxE literature. 
Publication bias can arise when authors, editors, and 
reviewers believe that positive findings are more worthy 
of publication than are negative or null results. There are 
many uninteresting ways for an empirical test of a 
hypothesis to fail to support it — the hypothesis was 
implausible to start with, the power of the test is inade- 
quate, the operationalization of the variables is not valid, 
and so forth. As a consequence, the greater interest in 
positive results is understandable, especially when they 
bring an insightful increment to, or even transformation 
of, researchers' understanding. Yet, interest in positive 
findings is clearly misplaced if they are false, as they can 
(mis)guide research efforts and funding priorities. 
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These problems are not unique to the study of cGxE 
(Ioannidis et al., 2014; Spellman, 2012). Concern about 
unacceptably high false-positive rates in the social sci- 
ences has garnered growing attention over the past several 
years. In a provocative article by Ioannidis (2005) entitled 
"Why Most Published Research Findings Are False," sev- 
eral conditions are outlined that contribute to why a novel 
research finding ultimately may be in error. These condi- 
tions include smaller studies; smaller effect sizes; greater 
number and lesser preselection of tested relationships; 
greater flexibility in designs, definitions, outcomes, and 
analytic models; greater interests and prejudices surround- 
ing the area of research; and situations in which there are 
more scientists in a field involved in chase of statistical 
significance. We believe that all of these conditions are 
likely to contribute to findings in the study of cGxE. 

A more recent article compellingly demonstrated how 
flexibility in data collection, analysis, and reporting can 
dramatically increase false-positive rates (Simmons, 
Nelson, & Simonsohn, 2011). This recognition has led to 
a growing movement in the social sciences to adopt new 
practices to promote research integrity (Cumming, 2014), 
including prespecification of studies and hypotheses, 
avoidance of selection and other questionable data- 
analytic practices, complete reporting of analyses and 
variables, and encouragement of replication. A "new sta- 
tistics" method has been proposed that includes recom- 
mended statistical practices, such as estimation based on 
effect sizes, confidence intervals, and meta-analyses 
(Cumming, 2014). These are practices that have already 
become more widespread in the genetics field (Agrawal 
et al., 2012; Boraska et al., 2014; Ripke et al., 2013; 
Steinberg et al., 2014; Stephens et al., 2013; Thompson 
et al., 2014), in which a plague of inconsistent and non- 
replicable genetic main effects has led to the adoption of 
more rigorous statistical practices, which have proven 
successful in advancing the field (Corvin, Craddock, & 
Sullivan, 2010; Sullivan et al., 2012; van Assen, van Aert, 
Nuijten, & Wicherts, 2014). Following these guidelines 
would go a long way toward improving the quality and 
trustworthiness of the cGxE literature as well. 

In the following sections, we focus on practical prob- 
lems as they relate to cGxE. We focus this discussion on 
two major components of cGxE research — the core 
ingredients of the interaction (i.e., how G [gene] and E 
[environment] are measured) and the recipe for combin- 
ing them (i.e., statistical problems with modeling their 
interaction). 

The ingredients ofcGxE: The choice of 
genetic and environmental variables 

In the enthusiasm surrounding the study of cGxE, many 
investigators have expanded their studies to include mea- 
sures of G or E, when that was not the original focus of 



the study. This expansion into new areas has happened 
in both directions: Researchers who have focused on 
carefully characterizing environmental effects have 
expanded their studies to include measured genes, and 
researchers who have focused on gene finding have 
expanded their studies to include measures of the envi- 
ronment. In theory, this expansion of cross-disciplinary 
science is a positive development; however, an unfortu- 
nate corollary has been that the added component does 
not always represent the state of the science in the other 
respective field. 

The choice of G. For example, the vast majority of the 
G that has been incorporated into cGxE studies consists 
of a handful of "usual suspect" candidate genes (Munafo, 
2006; e.g., SLC6A4 [also known as 5-HTT, MAOA, DRD2, 
COM!]), as discussed earlier. Often a single genetic 
marker is genotyped to represent the gene. Genotyping a 
single marker in a gene does not reflect the state of the 
science in genetics, which has moved toward more com- 
prehensive approaches to gene finding. It may be appro- 
priate to genotype a single genetic variant when that 
variant has a known functional impact on the gene (i.e., 
it produces an observable alteration in the manner in 
which the gene encodes the protein product). However, 
above and beyond this simple annotation of function, the 
impact of a candidate polymorphism is very challenging 
to establish. At its simplest, even when modeling a single 
variant, the characterization of that variant (or its mode of 
inheritance) can profoundly impact detection of cGxE. 
For instance, which allele is assigned as a risk allele 
and how many copies of this allele are required to quan- 
tify the diathesis need to be determined. How to model a 
genotype becomes a particular concern with smaller 
samples, in which homozygotes (individuals who carry 
two copies of a given allele) of the minor allele are often 
combined with heterozygotes (individuals who carry 
one copy of a given allele) in the interest of statistical 
power, potentially obfuscating the complexity underlying 
the genetic model. Consider the consequences if one 
applied this practice to other known genetic outcomes: 
For example, researchers know that two copies of one of 
several of the CFTR mutations are required for the diag- 
nosis of cystic fibrosis (a disorder with a recessive mode 
of genetic inheritance) and that this is etiologically dis- 
tinct from the one copy of the HTT trinucelotide expan- 
sion required for a diagnosis of Huntington's disease (a 
dominant disorder). Imagine the confusion if a prenatal 
genetic counselor ascribed the same degree of vulnera- 
bility to a fetus that tests positive for one copy (a carrier, 
unaffected with disease) versus two copies (will manifest 
the disease) of the CFTR mutation. Although the action of 
genetic variants on complex traits is not monogenic, the 
technical specification of their purported mode of action 
is still significant. 
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Large-scale, gene-finding efforts have moved to sys- 
tematic screens of the genome (e.g., Sklar et al., 2011; 
Treutlein et al., 2009; Wray et al, 2012), pathway and 
network analysis (Weng et al., 2011), and integration 
with model organism genetics (Zhao et al., 2012) as 
new avenues for increasing the probability of identify- 
ing relevant genes. Further, when a gene of interest is 
being studied, genetic variants across the gene are usu- 
ally genotyped to capture the many locations across that 
gene that could be involved 2 in altering gene regulation 
or function and producing different effects on behavior. 
For example, many early studies genotyped the TaqlAl 
polymorphism in DRD2 as a hypothesized biological 
candidate for a variety of outcomes related to reward 
deficiency, for which dopamine transmission was 
thought to play a role (Blum et al., 1990; Meyers et al., 
2013; Young, Lawford, Nutting, & Noble, 2004). 
Subsequently, it was discovered that the polymorphism 
was actually located in the neighboring gene ANKK1 
(Neville, Johnstone, & Walton, 2004). Researchers using 
more systematic studies of genetic variation across the 
region containing ANKK1 and DRD2 have found evi- 
dence that multiple genes may be involved (Gelernter 
et al, 2006; Yang et al., 2007, 2008). Differences in the 
way that individual genes, gene networks, and whole 
genomes are systematically studied has led to increasing 
distance between the gene-finding world and studies of 
cGxE, which still focus largely on the usual suspect can- 
didate polymorphisms. Clearly, better integration of 
these research areas is necessary. 

The choice of E. An equally important issue concerns 
the choice of E. The challenges within this area are illus- 
trated in the literature examining life stress as an environ- 
mental risk factor for depression. In attempting to 
replicate the original cGxE results of Caspi et al. (2003) 
pertaining to life events, many investigators have incor- 
porated a wide variety of ad hoc measures of life stress 
(Monroe & Reid, 2008). Almost any form of adversity or 
challenge, at any time in a person's life, has been used as 
an alternative index of "stress." For example, replication 
studies on this cGxE topic have included participant life 
stress occurring over a range of time, from 1 month 
through a lifetime. Other researchers have adopted life 
event scales known to possess serious measurement defi- 
ciencies (Monroe, 2008) or simply have used unique 
measures never used previously. In one review of this 
literature, researchers reported psychometric properties 
of the stress measures in only 5 of 18 studies (Monroe & 
Reid, 2008). Finally, even when roughly common omni- 
bus measures of life events were adopted, how the life 
events were combined (over time, severity, and type) for 
the final index of stress varied greatly across investiga- 
tions (Uher & McGuffin, 2010). 



The use of measures of the environment with proven 
reliability, empirical precedent, and theoretical plausibil- 
ity is critical for advancing the field. The elasticity and 
looseness of the environmental construct can present 
serious problems for measurement and for the likelihood 
of detecting a cGxE (Monroe & Reid, 2008). In the exam- 
ple of stress, it is highly likely that different types of stress 
are relevant for different types of cGxEs for particular 
kinds of disorder or disease. For example, chronic stress 
over years may be most relevant for conditions that 
develop over protracted periods of time (e.g., coronary 
heart disease), whereas acute, avers ive life events may be 
most relevant for conditions that often appear to come 
on rather quickly (e.g., a major depressive episode). 
Additionally, the developmental timing of the stressor can 
be critical as the social and biological impact can be 
expected to vary as a function of stage of development. 
More specific theoretical dimensions associated with 
stress should accompany such temporal distinctions (e.g., 
loss or humiliation life events vs. life events conveying 
threat and danger vs. other types of adaptive challenges). 
In a very real sense, there should be candidate "stressors" 
proposed for particular conditions based on a theoretical 
understanding of the plausible underlying mechanisms. 
Viewed from an alternative perspective, no one would 
expect to find a "true" cGxE if the wrong gene was 
assessed for the particular outcome. In a similar manner, 
any potentially valid cGxE will go undetected if the 
wrong form of environment is assessed or if the right 
form of the environment is assessed poorly (Monroe & 
Reid, 2008). 

Problems with the recipe: Statistical 
concerns in cGxE research 

There are a number of statistical considerations that influ- 
ence the detection and interpretation of cGxE that have 
not received widespread attention in the literature. 

We highlight some of the most critical issues next, and 
many of our concerns are consistent with those of others 
who have recently written on this topic (e.g., Roisman 
et al., 2012; Zammit, Lewis, Dalman, & Allebeck, 2010; 
Zammit, Owen, & Lewis, 2010). 

The importance of scale. First, evidence for interac- 
tions can depend on choice of scale as well as choice of 
statistical model. Despite one's conviction in the pres- 
ence or absence of cGxE, interactions are statistical phe- 
nomena and only have meaning in the context of a 
specific statistical and measurement model. However, 
many behavioral measures have no "true scale." One 
might, for instance, argue that a construct such as height 
has a true scale and, moreover, has a ratio scale with a 
meaningful zero point and equal intervals between data 
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points (Stevens, 1946): A board of 2 feet is twice as long 
as a board that is 1 foot long. However, many, if not most, 
of the constructs in the behavioral sciences do not have 
meaningful zero points and, thus, are scaled somewhat 
arbitrarily. Quantitative scales without meaningful zero 
points can vary further as to whether they have meaning- 
ful intervals between measurement points or simply 
reflect differences in relative magnitude. This is true of 
measures of both behavioral outcomes (e.g., a depres- 
sion score) and environments (e.g., family function, peer 
deviance, neighborhood disintegration). The scale of 
measurement matters profoundly in interaction research 
because evidence for an interaction can change solely 
depending on arbitrary choice of scale (Eaves, 2006). For 
example, predictors that combine multiplicatively to 
influence the outcome variable will combine additively if 
the outcome is log-transformed. In such situations, the 
significance of the interaction term depends on how the 
outcome is scaled, which, in most behavioral research, is 
arbitrary. 

The selection of model. As with choice of transforma- 
tion, choice of how to model interactions can profoundly 
affect evidence for them. In particular, there has long 
been debate in epidemiology regarding the relative utility 
of risk differences versus risk ratios. That is, if the rate of 
illness is 10 and 50 per 10,000 in groups unexposed and 
exposed to some risk factor, are researchers more inter- 
ested in the risk difference (40 new cases per 10,000) or 
the risk ratio of 5? From a practical perspective (e.g., 
public heath impact, focus for possible prevention, advice 
to patients), the risk-difference approach has much to 
recommend it. However, the risk-ratio approach is the 
more dominant in large part because it is easily imple- 
mented statistically in logistic regression. This ciistinction 
is critical because it defines the baseline model from 
which researchers assess interactions. In a risk-difference 
framework, an interaction reflects a deviation from a 
model in which risk factors add together. In a risk-ratio 
framework, an interaction reflects a deviation from a 
model in which risk effects multiply. Accordingly, the 
usage of logistic regression to study cGxE for binary out- 
comes of interest (such as presence/absence of a disor- 
der) fundamentally changes the nature of the relationship 
between two variables: Multiplication on the original 
scale of a variable conforms to addition on the logarith- 
mic scale. Thus, an "interaction" on the original scale can 
"disappear" or even be of opposite sign on the logarith- 
mic scale, and vice-versa. We refer the reader to other 
sources (Kendler & Gardner, 2010; Zammit, Lewis, et al., 
2010) for further discussion of these important issues. 

The use of cross-product terms. Statistical tests of 
cGxE effects often rely on the modeling of a 



cross-product term in a regression-type model. Valid 
detection of true interactions in these models requires 
that factors that could produce spurious interactions be 
ruled out. For example, when predictors are correlated 
and quadratic terms are not modeled, the cross-product 
term can carry the variance of the unmodeled quadratic 
term and generate spurious interactions (Lubinski & 
Humphreys, 1990). Moreover, failure to include quadrat- 
ics can also result in false-negative findings of interac- 
tions or the reversal of sign of true interactions (Ganzach, 
1997). More generally, if the underlying relationship 
between G or E and an outcome is nonlinear (e.g., a 
spline or higher order polynomial), misspecification of 
the analysis by failing to include a term to model the 
nonlinearity can generate a significant interaction term in 
the absence of a true interaction. 

The use of a cross-product term can be particularly 
problematic for modeling three-level, categorical geno- 
types. In the standard practice of assuming an additive 
genetic model, the use of a cross-product term will force 
the slope difference to be the same among all genotypic 
groups (e.g., the difference in slope between people car- 
rying 0 vs. 1 copies of the risk allele is constrained to be 
the same as the slope difference between individuals 
who carry 1 vs. 2 copies of the risk allele). It also forces 
the lines for the three genotypic groups to all cross at the 
same point when an interaction exists. Accordingly, an 
interaction will only be accurately represented by the 
cross-product term when these conditions are met, and 
there is no a priori reason to assume that these con- 
straints are sensible. This means that the regression lines 
implied by the use of a cross-product term may not accu- 
rately reflect the interaction present in the data. Figure 3 
illustrates the problem and demonstrates how a reparam- 
eterization of the regression equation with parameters 
additional to the single cross-product term can correct it 
(as further delineated in Aliev, Latendresse, Bacanu, 
Neale, & Dick, 2014). 

The importance of covariates. Failure to properly 
control for potential confounds can also be problematic in 
cGxE research. In nonexperimental research, researchers 
typically enter potential confounding variables (e.g., gen- 
der, ethnicity socioeconomic status, genotype quality) 
into regression equations to control for their effects. How- 
ever, this approach controls only for the additive effects of 
covariates; it does nothing to control for the potential con- 
founding effects that these covariates might have on the 
interaction itself (Keller, 2014; Yzerbyt, Muller, & Judd, 
2004). To properly control for confounders in cGxE 
research, investigators must also evaluate all relevant 
Gene x Covariate and Environment x Covariate interac- 
tion terms. To date, virtually no cGxE researchers have 
appropriately controlled for all covariate interactions 
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Four Parameter Model Six Parameter Model 




Environment Environment 

Fig. 3- The figure presents simulated phenotypic data for three genotypic groups (Gene = 0, 1, 2, indicating groups of individuals who 
carry 0, 1, or 2 copies of a particular allele), each shown in a different color. The four-parameter model corresponds to the case in which the 
interaction term is modeled by a cross-product term only. Although a significant interaction is detected, the corresponding linear regression 
lines do not match the data points, and the slopes are incorrectly ordered from 0 to 1 to 2 on the basis of the constraints imposed by the use 
of the cross-product tenia to model the interaction. Thus, although the model would produce a significant interaction, the regression lines 
implied by the model inaccurately represent the data and would be misleading as to the nature of the interaction. The data can be accurately 
reproduced by an extended parameterization of the regression model (six-parameter model) as detailed in Aliev et al.'s (2014) study. 



(Keller, 2014). This failure to include covariates is particu- 
larly concerning in mixed-ethnicity samples, in which 
stratification can not only produce spurious genetic 
main effect association to be detected (Price, Zaitlen, 
Reich, & Patterson, 2010) but can also cause Ethnicity x 
Environment interactions to appear as spurious Gene x 
Environment interactions. This is because the frequency 
of alleles naturally varies across ethnic populations and, 
in the presence of a coincidental excess of affected indi- 
viduals belonging to one ethnic group, spurious associa- 
tions and interactions with polymorphisms of no 
functional consequence, except a degree of natural ethnic 
variation, may emerge. 

Power to detect and characterize different types of 
interactions. Yet another concern is the low power to 
detect most plausible forms of interactions in the first 
place (McClelland & Judd, 1993) in observational field 
studies as compared with experimental studies in which 
independent variables can be efficiently manipulated. 
Under many conditions, theoretically meaningful inter- 
actions are likely to be quite small, accounting for 1% of 
the outcome variance, and the power to detect most 
plausible interactions will be quite limited without a 
large N. 



Further, even if an interaction is detected, discerning 
the true pattern of an interaction from observed results 
is even more tenuous. In recent years, there has been 
great interest in determining the form of the observed 
interaction in cGxE research because the interpretation 
of disordinal (i.e., cross-over) interactions theoretically 
differs from ordinal interactions. Specifically, cross-over 
interactions lend themselves to a differential susceptibil- 
ity interpretation in which a given "risk" or "malleable" 
allele is associated with both poorer outcomes in a "bad" 
environment but better outcomes in a "good" environ- 
ment; ordinal interactions lend themselves to a diathesis- 
stress interpretation in which it is the combination of a 
risk-conferring allele and a bad environment that exacer- 
bates the likelihood of manifesting the outcome (e.g., 
Belsky, Bakermans-Kranenburg, & van Ijzendoorn, 2007; 
Belsky et al., 2009). However, simulations demonstrate 
that, conditional on a Type I error, the form of an osten- 
sibly "significant" interaction is usually of a cross-over 
(i.e., disordinal) nature, especially when samples sizes 
are small (Sher & Steinley, 2013). Boardman et al. (2014) 
recently made a similar observation in reference to the 
emerging genome-wide, gene-by-environment approach 
(Cornells et al., 2012; Mukherjee, Ahn, Gruber, & 
Chatterjee, 2012; Thomas, Lewinger, Murcray, & 
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Gauderman, 2012) by demonstrating that when many 
interaction tests are performed, the most significant p val- 
ues will come from disordinal interactions even when 
such interactions are generated from random data. 
Boardman et al. (2014) noted that these findings "con- 
form to the differential susceptibility model but will not 
tell us anything meaningful about the way in which envi- 
ronments systematically moderate genetic factors . . . 
because they will likely be a statistical artifact" (p. 123). 

Moreover, even true ordinal interactions that are sta- 
tistically significant can appear to be of a cross-over 
type (Sher & Steinley, 2013) because of random error. 
All linear interactions imply a cross-over at some point, 
even if outside the range of observed values. This is not 
a trivial issue because a typical practice is to plot values 
+1 standard deviation and -1 standard deviation above 
and below the mean of the moderator (Aiken & West, 
1991). However, -1 standard deviation can represent 
values that rarely or never exist in nature for skewed 
predictors. We note that some authors (Roisman et al., 
2012) have recently recommended extending Aiken and 
West's (1991) guidelines to ±2 standard deviations to 
provide 95% coverage of the observed values. Given the 
highly skewed nature of many environmental expo- 
sures, attention to the underlying distribution of all 
study constructs is necessary so as not to generate mis- 
leading regression plots covering regions of sparse or 
imaginary data. 

Recently, techniques for estimating the standard error 
of the cross-over point have been proposed that could, in 
principle, allow stronger inferences about the actual form 
of the interaction ("Widaman et al., 2012). Alternatively, 
establishing significant regions around each regression 
line with standing approaches (e.g., Johnson & Neyman, 
1936) to characterize where two slopes overlap and 
where they do not could also be used to increase confi- 
dence that an ostensible cross-over shows a desired 
degree of statistical differentiation from an ordinal inter- 
action. Such approaches, in principle, could provide 
greater confidence in believing a true cross-over has 
been detected. Consistent with the other points made 
earlier, such approaches depend on being confident that 
the interaction is not an artifact of scaling, is not caused 
by (unmodeled) nonlinearity, and is not a Type I error. 

cGxE versus gene-environment correlation 
(rGE). The term rGE refers to instances in which expo- 
sure to environment is nonrandom and correlated with 
genetic vulnerability, 3 whether through passive, active, or 
evocative processes. 4 For instance, in classical behavior 
genetics, rGE is represented by genetic factors that influ- 
ence the outcome (e.g., alcohol and tobacco use) and the 
environment (e.g., peer relationships; Harden et al., 
2008) as indexed by a genetic correlation. Similarly, in 



measured gene studies, presence of rGE is indexed by 
variations in genotype or allelic frequency as a function 
of the environmental exposure. For instance, Salvatore 
et al. (2014) reported an association between a polygenic 
score for alcohol problems and peer deviance, indicating 
that individuals who are at genetic risk for alcohol prob- 
lems are also more likely to have deviant peer groups. It 
is likely that for many outcomes both rGE and cGxE may 
be important; however, the presence of rGE can compli- 
cate the interpretation of cGxE. Researchers using behav- 
ioral genetic and twin models implement several statistical 
approaches to account for and even explicitly model rGE 
(e.g., Eaves & Erkanil, 2003; Purcell, 2002; van der Sluis, 
Posthuma, & Dolan, 2012). In measured gene studies, the 
first step in testing for potential rGE involves estimation 
of a correlation between genotype and environment. In 
the absence of such a correlation, as has been noted for 
5-HTTLPR and stressful life events, cGxE testing can 
proceed without concern. If the correlation is solely 
attributable to outliers in the environmental measure, 
removal or "Winsorization may eliminate rGE (e.g., Bog- 
dan, Williamson, & Hariri, 2012). A modest correlation 
between genotype and environment may require more 
careful consideration. For instance, in revisiting the inter- 
action between a polymorphism in the MAOA gene and 
exposure to childhood physical abuse in the develop- 
ment of antisocial behaviors (Caspi et al., 2002), Kim- 
Cohen et al. (2006) examined whether the MAOA 
genotype was correlated with not only exposure to abuse 
(i.e., evocative rGE) but also to maternal antisocial behav- 
ior (i.e., passive rGE), with the latter being a key correlate 
of transmission of risk for antisociality and for increased 
likelihood of exposure to abuse. There was evidence for 
the latter, whereby maternal antisocial behavior was cor- 
related with offspring exposure to abuse; however, the 
effect of the interaction persisted even after accounting 
for this effect. Alternatively, Salvatore et al. (2014) 
accounted for rGE by residualizing both their polygenic 
score and their environmental measures (parental knowl- 
edge and peer deviance) for each other prior to testing 
for cGxE in the etiology of alcohol use problems. The 
interaction between parental knowledge and the poly- 
genic scores remained significant; however, the interaction 
with peer deviance was no longer significant, indicating 
the possibility of both rGE and cGxE for the former but 
rGE alone for the latter. Therefore, although both mecha- 
nisms of gene-environment interplay (rGE and GxE) may 
be at work, testing for rGE is necessary before conclusions 
regarding GxE are made. "When rGE is presented, methods 
to account for it should be implemented. In some instances, 
relevant data may not be available (e.g., availability of 
parental phenotypes to test for passive rGE), or the corre- 
lation may be complex and mediated by other unmea- 
sured factors. In such instances, the possibility that rGE 
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Recipe for GxE 



Genes: 

1. Why this gene? 

2. Why these variants? 

3. Why this biological model? 

4. Just one SNP/gene, or polygenic? 



Environment: 

1. Why this environment? 

2. Why this measure (include associated psychometric 
information)? 

3. Justification for the scaling (robust to scale transformations?) 



Statistical issues: 

1 . Sensible transformations of the predictors or outcome variables should be tested and reported to understand the robustness of 
the interaction. 

2. Decision of whether to model the outcome as a risk ratio or a risk difference should be justified. 

3. If alternative measures of the environmental or outcome variable exist, the choice of measures should be justified. 

4. The number of alternative models that were fit, the results of these tests should be presented. 

5. Theoretically important gene-by-covariate and environment-by-covariate interaction terms should be included in the model, 
regardless of their statistical significance. 

6. The significance of the interaction should be tested for robustness in a model that includes quadratic terms of the genetic and 
environmental variables. 

7. The correlation between the genetic and environmental variables should be reported. 

8. When model-derived data are being used to graphically illustrate an interaction, these should be accompanied by co-plotting the 
underlying raw data or representative distributional information to verify that the statistical model is a valid representation. This 
is especially important when covariates are in the model. 

9. Power computations should be presented (regardless of positive/negative findings) using reasonable effect sizes. 



Fig. 4. Recommendations for rigorous gene-environment interaction (GxE) research practices. SNP = single nucleotide 
polymorphism. 



may contribute to the relationship between genotype and 
environment should be acknowledged. 

In summary, although the statistical approaches for 
modeling interactions are well established, having confi- 
dence in the statistical validity of an interaction requires 
due diligence on the part of the investigator. These include 
attention to scaling issues, characterizing the underlying 
linearity of the relationships under investigation and 
determining whether nonlinear models are necessary, 
controlling for relevant confounders including various 
forms of rGE, and ensuring plotted results are not unduly 
influenced by the constraints imposed for rendering an 
easy-to-interpret graph. Perhaps the greatest challenge is 
to minimize the likelihood that an observed interaction is 
not a Type I error given that various data sets have a large 
number of candidate Gs and candidate Es; there is consid- 
erable flexibility in approach to analysis; and under most 
plausible conditions, power to detect GxE is likely to be 
low. Many of the issues described earlier (as well as some 
others) are described by Roisman et al. (2012), who pro- 
vided a list of thoughtful guidelines for addressing various 
issues, such as characterizing whether an obtained inter- 
action is of a cross-over type (and to the extent that the 
magnitude of the cross-over is meaningful), the problem 
of nonlinearity, and Type I errors. 



Recommendations 

Although the list of challenges associated with character- 
izing cGxE is long, many of these can be addressed by 
adopting a handful of rigorous research practices. Later 
in the article, we delineate a series of recommendations 
that we believe will help advance the study of GxE and 
ensure that the literature provides meaningful progress 
for science. We focus at greatest length on the issues per- 
taining to the G in GxE, under the assumption that this 
will be most unfamiliar to social scientists. Many of the 
other concerns, reviewed in less detail later in the article, 
are not specific to the study of cGxE but complement the 
broader discussion (Cumming, 2014; Ioannidis, 2005; 
Lakens & Evers, 2014; Simmons et al., 2011) in the social 
sciences about how to produce robust, replicable find- 
ings that advance science. The recommendations 
described in this article are reviewed in Figure 4. 

Selection of genes 

Given the prevailing skepticism surrounding candidate 
gene research, the burden of proof for the selection of a 
candidate gene is high. Such a rationale should be con- 
vincingly articulated in a manner specific to the 
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phenotype and environment under study. The crux of the 
argument for selection of a particular gene to study lies 
in the statistical prior probabilities for the gene, that is, on 
the basis of prior evidence and the quality of the source 
of that evidence, how likely is it that this is a robust can- 
didate. There is nothing inherently wrong with studying 
candidate genes, though the very idea of candidate gene 
research has fallen out of favor because of the historical 
issues with studying hypothesized biological candidates 
that have not held up in more systematic well-powered 
studies, as reviewed earlier. There are notable cases in 
which hypothesized biological candidates have shown 
robust associations with outcome. For instance, rsl229984 
in the ADH1E gene was one of the earliest candidate 
gene variants proposed in the etiology of alcoholism, 
particularly in Asians (Agrawal & Bierut, 2012). 5 The vari- 
ant, which is rather rare in European American popula- 
tions, has recently been identified in adequately powered 
GWAS as well (Bierut et al., 2012). Similarly, early candi- 
date gene and experimental studies implicated a SNP in 
a nicotinic receptor (rs 16969968 in CHRNA5) as a risk 
factor for tobacco smoking and nicotine dependence, but 
the role of this variant in cigarette smoking was not 
widely accepted until it was identified in multiple meta- 
analyses of cigarettes smoked per day, the largest of 
which had a sample size exceeding 70,000 (J. Z. Liu 
et al., 2010; Thorgeirsson et al., 2010; Tobacco and 
Genetics Consortium, 2010). 

Knowing that the prior probabilities for a candidate 
gene selected on hypothesized biological rationale is 
low, there are a variety of other methods for selecting 
candidate genes for study that should produce more 
robust and reliable findings, including focusing on genes 
with either large main effects or stronger a priori evi- 
dence. For the latter, methods of gene selection that have 
a greater likelihood of producing meaningful cGxE 
results include focusing on candidates suggested by well- 
powered GWAS or meta-analyses, or by model organism 
work, ideally with replication. Identification of genotypes 
of substantial main effect is challenging and limited. For 
example, the s2/ s3/ s4 polymorphism at the Apolipoprotein 
E locus is a known and important risk factor for 
Alzheimer's and coronary heart disease (Corder et al., 
1993; Ward et al., 2009). Such large-effect polymorphisms 
seem to be particularly compelling candidates for cGxE 
research because the genetic effect is already known; 
only an environmental modification of this effect is 
required for evidence for an interaction. 

A systematic strategy of gene identification followed 
by efforts to characterize moderation of the effect associ- 
ated with that gene can be found in the example of 
GAERA2 and moderation by parental monitoring. 
GAERA2 was originally identified by the Collaborative 
Study on the Genetics of Alcoholism, the largest gene 
identification project in the area of alcohol dependence 



(Begleiter et al., 1995), by systematically interrogating 
gamma-aminobutyric acid receptor genes with evidence 
for involvement in ethanol-related responses (Harris, 
1999) that were located in a region of linkage identified 
with both clinical alcohol dependence phenotypes and 
electrophysiological endophenotypes (Ghosh et al., 2003; 
Porjesz et al., 2002; Reich et al., 1998). Association 
between GABRA2 and alcohol dependence was subse- 
quently reported (Edenberg et al., 2004) and replicated 
by multiple independent research groups around the 
world using a variety of research designs (Covault, 
Gelernter, Hesselbrock, Nellissery, & Kranzler, 2004; 
Enoch, Schwartz, Albaugh, Virkkunen, & Goldman, 2006; 
Fehr et al., 2006; Lappalainen et al., 2005; Olfson & 
Bierut, 2012; Soyka et al, 2008). Although there have 
been failures to replicate (Covault, Gelernter, Jensen, 
Anton, & Kranzler, 2008; Drgon, DAddario, & Uhl, 2006; 
Matthews, Hoffman, Zezza, Stiffler, & Hill, 2007), a recent 
meta-analysis has confirmed the evidence for association 
(Li et al., 2014). In addition, translational researchers have 
found the role of GABRA2 in rodent drinking (Dixon, 
Walker, King, & Stephens, 2012; J. Liu et al., 2011) and in 
the brain's response to alcohol-related (Kareken et al., 
2010) and monetary reward cues (Villafuerte et al., 2012). 
Although GABRA2 SNPs have not been identified via 
GWAS, they typically have the lowest p values of candi- 
date polymorphisms extracted from GWAS data (Olfson 
& Bierut, 2012). Interaction between GABRA2 and paren- 
tal monitoring was tested on the basis of the twin litera- 
ture suggesting that parental monitoring moderates the 
relative importance of overall genetic effects (as inferred 
on the basis of comparisons of twins, not with measured 
genotypes) on substance use outcomes in adolescence 
(Dick, Pagan, Viken, et al., 2007; Dick, Viken, et al., 2007); 
genetic effects assumed greater importance under condi- 
tions of lower parental monitoring, presumably because 
adolescents with lower monitoring have more access to 
the substance and opportunity to express their genetic 
predisposition. An interaction between GABRA2 and 
parental monitoring was tested in an independent sam- 
ple; a stronger association between the gene and trajec- 
tories of externalizing behavior was found under 
conditions of lower parental monitoring, as was hypoth- 
esized on the basis of the twin findings (Dick, Latendresse, 
et al., 2009). This series of studies illustrates how differ- 
ent literatures and study designs (linkage and association 
studies; twin studies; and longitudinal, developmental 
studies) were integrated to characterize a cGxE effect. In 
this case, the candidate gene under study was selected 
on the basis of a series of converging pieces of evidence 
that indicated involvement in alcohol and externalizing 
outcomes before it was studied in the context of cGxE. 

In short, not all candidate genes are created equal, and 
there is not a single pathway to determining their viabil- 
ity in a cGxE study. The burden is on the researcher to 
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provide a compelling argument for the study of a particu- 
lar candidate. Furthermore, we suggest that the bar 
should be much higher than the field has insisted on to 
date. 

Selection of genetic variants 

In addition to strong justification for the selection of the 
gene under study, a second area that should be justified 
is the genotyping strategy: If only a select number of 
polymorphisms in a gene are being genotyped, how and 
why were those selected? There are several methods for 
selecting polymorphisms — one can utilize SNP content 
from a preexisting GWAS array or genotype custom con- 
tent individually or en masse (e.g., as offered by the 
Illumina Golden Gate technology; Hodgkinson et al., 
2008). Prices for GWAS arrays, especially those designed 
to include custom content, have dropped consicierably, 
and often it is far more expensive and less cost-efficient 
to genotype a small number of polymorphisms than to 
genotype in large scale. In some cases, the technology 
required to genotype a particular variant, especially one 
that is not a SNP (e.g., a variable number of tandem 
repeats), is specialized and may entail unique require- 
ments, as most commercial arrays and custom genotyp- 
ing platforms may not include this. Nonetheless, if only a 
few SNPs can be genotyped, tagging a gene may be pref- 
erable to simply pursuing the usual suspects. Tagging 
refers to identifying all variation, regardless of function, 
that captures variation across the gene. This includes 
important regulatory regions, such as promoters and 
enhancers, that are increasingly recognized as key con- 
tributors (Zannas & Binder, 2014). Reliance on simple 
annotation of "function" (i.e., typically a nonsynony- 
mous exonic variant, meaning that the location is known 
to be in a part of the gene that produces an alteration in 
the gene's protein product) is short-sighted, as modern 
annotations available via the identification of epigenetic 
marks along the genome suggest that even intronic vari- 
ants can have a profound impact on genomic action (e.g., 
Ziller et al., 2013). An exciting and upcoming possibility 
is a highly cost-efficient chip being designed by the 
Psychiatric Genomics Consortium that will include cus- 
tom common and rare variation and will be based on 
SNPs nominated by expert consensus and validated via 
meta-analytic methods. 

Modeling genes 

Another consideration is that how genotype is coded 
implies a biological model, as reviewed earlier, and, thus, 
the model should be specified and justified (or explicitly 
stated as exploratory, with appropriate corrections for 
multiple testing). Genotypes should not be collapsed 
purely to increase power, and if they are, then effects 



should be described across all genotype groups for com- 
pleteness. With increasing recognition that individual 
genetic polymorphisms on their own are likely to have 
very small effects, there should be justification provided 
if single polymorphisms are being studied in isolation. 
Further, with growing knowledge about gene networks 
and integrated functional pathways, there is opportunity 
to think more broadly about the potential for incorporat- 
ing gene networks or pathways into tests of GxE, allow- 
ing one to go beyond focusing on a single gene and 
instead focus on sets of genes that interact biologically. 
This brings its own set of complications, as decisions 
must be made about the nature of genetic effects across 
the pathway or network: Is a mutation in any of the genes 
in the network sufficient? Are all variations within the 
network expected to have an equal effect on outcome? 
Are mutations across multiple genes in the network act- 
ing cumulatively to affect outcome? Similar questions can 
be asked about multiple variants within any given gene 
of interest. Should polygenic risk scores that capture 
risk across the genome be used? The answers to these 
questions depend on the investigator's theory behind 
how the environment is operating: Does the investigator 
believe that all genes involved in outcome should be 
moderated by that environment in a parallel fashion or 
only subsets of the relevant genes? There is no straight- 
forward answer to these questions, but what is clear is 
that deeper thought must be given to these issues to 
move the study of cGxE forward. Justification for the 
choices made in any given GxE study should be included 
in the publication. Because it is challenging to keep up 
with advances in the field of genetics, we suggest that 
this is an area in which collaborations between geneti- 
cists and psychologists can be particularly fruitful, as con- 
nection to the latest findings from statistical and 
psychiatric genetics about the rapidly evolving knowl- 
edge of the underlying genetic basis for a given outcome 
of interest can ensure that the genetics being integrated 
into psychological science represents the latest advances 
from the field of genetics. The annual meetings of the 
Behavior Genetics Association (www.bga.org) and the 
International Society for Psychiatric Genetics (www.ispg 
.net) are opportunities to learn about the latest advances 
in psychiatric and behavioral genetics and to potentially 
develop collaborations with scientists working directly in 
those areas. In addition, Text Box 1 of this review (in the 
Supplemental Material) lists a variety of genetics resources 
that may be useful to investigators in the social sciences 
that are interested in adding an informed genotyping 
component to their study. 

Selection of the environment 

Investigators should provide theoretical rationale for the 
selection of the environmental factor under study. Why is 
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there reason to believe that this aspect of the environ- 
ment will have a moderating effect? Does the investigator 
believe this is an environmental factor with a time-limited 
or enduring effect? Justification for the scale of measure- 
ment of the environment should be provided as well as 
reporting of results on biologically defensible transforma- 
tions of the independent and dependent variables. 
The environmental measure should reflect the "state of 
the art" method for measuring the particular feature of 
the environment. If it does not, justification for why this 
particular measure should be relevant/adequate should 
be included as well as traditional supportive psychomet- 
ric information. 

Accurate reporting of multiple testing 

Researchers should explicitly detail how many total poly- 
morphisms were available to them, how many were 
tested, and what types of guarding against inflation of 
Type I errors were made. In an identical vein, researchers 
should explicitly detail how many environmental vari- 
ables, and how many different methods of operational- 
izing these environmental variables, were considered, as 
well as transformations of such, and selection of models. 
These types of procedures for genes are now routine in 
GWAS but are rare in cGxE studies. Such explicit disclo- 
sures will help increase confidence in positive findings 
that survived proper multiple testing corrections. 

Power and small samples 

Investigators should demonstrate that the sample being 
used has adequate power to detect an interaction effect 
for the variables under study. Power computations should 
be presented regardless of the nature of the finding. In 
other words, researchers that produce studies with posi- 
tive findings should be encouraged to present power 
computations as well (K. S. Button et al., 2013). 
Computations should be specific to the statistical tech- 
nique, distributions of the variables under investigation, 
and the hypothesized form of the interaction. Power 
analyses should assume realistic cGxE effect sizes. GWAS 
suggest that main effects are typically of a small magni- 
tude, most accounting for less than 1% of the variance in 
a psychiatric phenotype. If the investigator has reason to 
believe that a larger effect size is likely for their study, this 
should be clearly spelled out and justified. For instance, 
one might hypothesize that genes exert stronger effects 
on endophenotypes (e.g., neuroimaging outcomes) that 
are, arguably, more proximal to their action (though see 
Flint & Munafo, 2007; Munafo & Flint, 2009). Or, it is pos- 
sible that the use of a phenotypic measure thought to be 
of much greater reliability and validity than existing clini- 
cal phenotypes could enhance power. Finally, 



investigators should use meta-analysis and/or integrative 
data analysis (e.g., Hussong, Curran, & Bauer, 2013) to 
combine data across multiple studies so that small-scale 
research can contribute to more definitive results. 

A question that sometimes arises is what can and should 
be done with studies whose small sample size is necessi- 
tated by the prohibitive costs associated with measurement 
of the phenotype, for example, with behavioral or neuro- 
imaging studies, multiwave longitudinal studies, or studies 
of special populations. Regardless of the nature of the vari- 
ables (e.g., self-report vs. neuroimaging), a small sample 
size nearly always implies reduced power. In fact, K. S. 
Button and colleagues (2013) reported that the median sta- 
tistical power across 49 meta-analyses of neuroscience 
studies was typically less than 20%, with the estimate plum- 
meting to 8% when examining neuroimaging studies 
alone — and that is without estimation of G or GxE (K. S. 
Button et al, 2013). Some have argued that genotypic effect 
sizes associated with endophenotypes (such as neuroimag- 
ing outcomes) are likely to be higher because the outcome 
is more proximal to the biological substrate (E. J. Rose & 
Donohoe, 2013); yet, there is no demonstrable evidence 
that researchers understand the genetic underpinnings of 
threat-related amygdala reactivity and habituation any bet- 
ter than they do the etiology of depression and anxiety. In 
other words, endophenotypes may be as polygenic as self- 
report measures (Flint & Munafo, 2007). 

Replication is key for findings based on small samples, 
but, as Button and colleagues have noted, the winner's 
curse often inflates initial results, and unless the replica- 
tion sample is substantially better powered, the ceiling 
placed on average sample size likely perpetuates false- 
positive findings until enough samples have been 
amassed to conduct a meta-analysis (K. S. Button et al., 
2013). Meta-analysis and integrative data analysis (Hussong, 
Curran, & Bauer, 2013) are particularly attractive tech- 
niques to combine data across multiple studies so that 
small scale research can contribute to more definitive 
results. In instances in which that is impossible — perhaps 
because the sample is rare and unique, because the out- 
come under study is highly novel, or because the envi- 
ronmental factors have been measured with superior 
assessments that other studies do not include — we rec- 
ommend that researchers acknowledge their limitations 
and present a candid account of power in the study, even 
if it implies that their findings might be spurious. Readers 
of this literature should carefully consider the caveats of 
such small sized studies so as not to perpetuate a series 
of cGxE emerging from and replicated in small samples. 

Manipulation/presentation of data 

The statistical properties that surround the detection of 
the interaction should be specified and discussed. Is the 
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interaction significant on an additive or multiplicative 
scale? Investigators should provide evidence that detected 
interaction effects are robust to transformation of scale 
and should provide a strong defense of the scaling used. 
If it is not robust to transformation, the implications of 
this should be discussed. Common artifacts associated 
with nonlinearity should also be evaluated and ruled out. 
Similarly, if the interaction is detected under some condi- 
tions or with some measures of the environment but not 
others, this should be reported and incorporated into the 
theory behind what the findings contribute to the litera- 
ture. Perhaps most important, the raw data along with 
95% confidence intervals should always be plotted and 
reported in the publication, not just the model-derived 
regression lines, which can be misleading, especially 
when some combinations of the independent variable 
have small sample sizes and when control covariates are 
included in the model. Because apparent cGxE effects 
can be due to stratification, genotyping artifacts, and 
even gender (Keller, 2014), careful attention must be paid 
to the variables under study to bolster confidence that 
the moderation effect is due to the specific gene/environ- 
ment under study. For example, if White individuals are 
more sensitive to environmental trauma, leading to post- 
traumatic stress disorder, in a mixed ethnicity sample 
examining environmental Trauma x Gene interactions, 
any SNP that differentiates White individuals from Black 
individuals will also show an apparent cGxE. In this case, 
the gene is not the true moderator — race is; the gene was 
merely correlated with race. To increase confidence in 
cGxE findings and to eliminate alternative explanations 
for them, researchers need to include all relevant Gene x 
Covariate and Environment x Covariate interaction terms 
in their models. Researchers should consider including 
quadratic transformations of the environmental term as 
well as covariates, as these may change the interpretation 
of the interaction depending on multi-collinearity. Finally, 
the assumptions that are imposed by modeling interac- 
tions with cross-product terms in linear regression or by 
the use of logistic regression (as discussed earlier) should 
be acknowledged and, when appropriate, justified. 

Addressing publication bias 

In circumstances in which the proportion of false findings 
is high, as we suspect is the case for candidate gene main 
effect and cGxE studies, the issue of publication bias 
should be considered by both authors and editors. Authors, 
journal editors, and reviewers should understand that 
novel positive findings may often be false, and so they 
should set higher standards of evidence, including stricter 
standards of significance testing, full accounting for all 
sources of multiple testing, and direct independent repli- 
cation prior to publication. Equally, investigators should 



strive to submit, and editors to publish, adequately pow- 
ered, independent direct replication attempts, irrespective 
of the outcome — positive, negative, or null. Meta- or mega- 
analyses that combine data across multiple studies while 
accounting for heterogeneity allow smaller sized studies to 
contribute to hypotheses necessitating larger samples, 
such as cGxE. Such collaborative research should be 
encouraged. These kinds of recommendations, aimed at 
reducing the role of publication and other biases, are 
being adopted by journals that have an interest in cGxE; 
some recent examples are Behavior Genetics (Hewitt, 
2012), the Journaloj 'Abnormal Child Psychology (Johnston, 
Lahey, & Matthys, 2013), and Drug and Alcohol Dependence 
(Munafo & Gage, 2013). We would recommend that other 
journal editors whose journals deal with substance abuse, 
psychiatry, and related fields adopt similar policies. 

Conclusions 

Studying how genetic predispositions and environmental 
circumstances come together to contribute to complex 
behavioral outcomes has great potential for advancing the 
understanding of the development of psychopathology. It 
represents a clear theoretical advance over studying these 
factors in isolation. However, research at the intersection 
of multiple fields creates many challenges. Studying cGxE 
requires appropriate understanding of genetic mecha- 
nisms, appropriate measurement of the environment, as 
well as a conceptual framework for integrating the two 
with respect to a specific outcome of interest and, criti- 
cally, of the statistical principles that underpin cGxE stud- 
ies. It is not likely, or expected, that every investigator 
conducting cGxE research will have the requisite exper- 
tise in all of these areas; accordingly, we encourage cGxE 
studies that are collaborative efforts involving individuals 
with common interests but diverse expertise. 

The National Institutes of Health has initiatives aimed 
at addressing challenges associated with genotypic 
research, many of which are also relevant to the study of 
cGxE. For example, in recognition of the fact that gene 
identification requires large numbers, but one of the 
challenges that is often encountered as researchers 
attempt to pool their samples is the use of different mea- 
sures across studies, the PhenX initiative was launched 
(www.phenX.org; Hamilton et al., 2011). PhenX brought 
together panels of experts across a variety of research 
areas to come up with recommended consensus mea- 
sures (including both outcome and environmental mea- 
sures) for inclusion in genetics studies to encourage the 
use of common measures to facilitate cross-study com- 
parisons and analyses. There are limitations to this 
approach: Brief, low-burden measures were preferen- 
tially selected to encourage more widespread uptake, 
which may result in less precise or comprehensive 
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assessments in the case of some constructs. However, it 
represents a step toward facilitating collaborative efforts 
in genetics research. The use of standardized measures 
across studies could also help advance the cGxE field, 
with greater emphasis placed on replication and com- 
bined analyses across research groups to enhance sam- 
ple size and corresponding power. The Office of 
Behavioral and Social Sciences Research at the National 
Institutes of Health also has resources on its website 
(http://obssr.od.nih.gov/) to help social scientists with 
the incorporation of genetic information into their stud- 
ies. Many of these issues are not specific to the study of 
psychopathology. In a recent article on GxEs in cancer 
epidemiology, coming out of a National Cancer Institute 
think tank, Hutter, Mechanic, Chatterjee, Kraft, and 
Gillanders (2013) described similar challenges. 

As investigators who have explored cGxE hypotheses 
ourselves, with some of our own work not meeting the 
standards delineated in this review, we were compelled 
to ask "how can we do better?" We hope we have out- 
lined some such strategies. Through greater awareness of 
the challenges in conducting cGxE research, resources 
available to aid in conducting high-quality cGxE studies, 
and proactive efforts to move cGxE studies in this direc- 
tion, it is our hope that this growing area of research will 
eventually reach its potential to deeply inform to the 
understanding of complex behavioral outcomes. 
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Notes 

1 . Words in bold font in the text are defined in the Supplemental 
Material. 

2. This is often called LD-tagging and indicates that you can 
use information about the LD or correlation structure across 
variants within and across genes to know how many genetic 
variants you need to genotype to capture most of the variable 
locations in the gene that could be associated with outcome 
(see Text Box 1 in the Supplemental Material for software). 

3. Mendelian randomization is a related and novel conceptual- 
ization of genotype representing environmental exposure pro- 
pensity and might explain cGxE in the presence of rGE. We 
refer the reader to Smith (2010) to learn more. 

4. Passive rGE is more common during childhood and adoles- 
cence and refers to individuals being differentially exposed to 
an environment without their own initiative, most likely because 
aspects of their environment are provided by their parents with 
whom they also share genetic variance. For instance, antisocial 
parents may pass on a genetic liability to antisociality and expose 
their children to an abusive environment. Evocative rGE refers 
to environments that an individual elicits/evokes from others on 
the basis of his or her genetic predisposition. For instance, an 
antisocial adolescent may evoke harsher parenting. Finally, as an 
individual matures into adulthood, the importance of active rGE 
increases. Here, an individual actively selects his or her environ- 
ment, or niche, on the basis of characteristics of his or her genetic 
predisposition. For example, antisocial youths may select into 
high-risk neighborhoods or affiliate with deviant peers. 

5. The ADH1B polymorphism (rsl229984) has been implicated 
in Asian populations to afford protection against the develop- 
ment of alcoholism, putatively via flushing (facial reddening 
due to accelerated conversion of ethanol to acetaldehyde); 
rsl229984 was not originally detected in GWAS on European 
American populations. The minor allele frequency of this vari- 
ant in European Americans is low (<5%), and commercial 
GWAS arrays rely on common variation. Accordingly, the SNP 
was neither genotyped on these arrays nor could it be reliably 
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imputed. However, genotyping this polymorphism resulted in 
genome -wide significant association signals (Bierut et al., 2012), 
and with the advent of more recent arrays that target this vari- 
ant better, there is now evidence in GWAS of an association 
between rsl229984 and alcoholism at p = 1.2 x 10" 31 (Gelernter 
et al., 2014). In this instance, GWAS did not initially guide iden- 
tification of a logical and validated candidate gene, highlighting 
that all techniques have their strengths and weaknesses. 
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